Abstract

Inductive inference is a recursion-theoretic theory of learning, first developed by E. M. Gold
(1967). This paper surveys developments in probabilistic inductive inference. We mainly focus
on finite inference of recursive functions, since this simple paradigm has produced the most
interesting (and most complex) results.

1 Introduction

Understanding the process of learning has always fascinated scientists. There are several
computational theories of learning. One of the oldest is inductive inference, established
by Gold […]. This theory considers the process of learning from the viewpoint of
computability theory. Unlike other theories of learning (for example, PAC-learning […, 20]),
inductive inference does not make probabilistic assumptions about the world. However,
probabilistic algorithms do appear in inductive inference, and the study of probabilistic
inductive inference raises many interesting problems with elements of both computability theory
and combinatorics. In this paper, we survey some of these problems.

We start with a general introduction to inductive inference. Learning can be considered as 
a process of gathering information about an unknown object, processing this information and 
obtaining a description of the unknown object. Ideally, we would like to obtain a complete 
description of the object. 

In the theory of inductive inference, objects are arbitrary recursive (computable) functions
(or recursively enumerable languages). The reason is that any algorithmic behavior
can be represented as a recursive (computable) function and, hence, we obtain a model that
includes any learning situation. Throughout this paper, we only consider learning of total
recursive functions (except Section 8, where we consider recursively enumerable languages).

*Parts of this work were done at the University of Latvia, supported by Latvia Science Council
Grants 93.599 and 96.0282.

The natural data about a function f are its values f(0), f(1), f(2), . . . and the natural
representation of these data is the sequence (0, f(0)), (1, f(1)), (2, f(2)), . . .. The most
general type of description for a computable function is a program in a universal programming
language. Any other description can be converted to this form.

This gives us the following learning model. A learning algorithm receives the values of
an unknown function f in the natural order: (0, f(0)), (1, f(1)), (2, f(2)), . . . and produces
a program h. The algorithm succeeds on f if the program h computes f.
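The model can be made concrete with a small Python sketch. Everything named here is an assumption made for illustration only: a "program" is modeled as an index into a hypothetical table PROGRAMS of total Python functions, and the target class consists of self-describing functions, i.e. functions f for which f(0) is itself a program for f. The helper `succeeds` compares only finitely many values, because equality of computable functions cannot be decided in general.

```python
from itertools import count

# Hypothetical toy "programming system": program h is the function PROGRAMS[h].
# Each entry is self-describing: PROGRAMS[h](0) == h.
PROGRAMS = [
    lambda x: 0,                   # program 0
    lambda x: 1 if x == 0 else x,  # program 1
    lambda x: 2 + x * x,           # program 2
]

def graph(f):
    """Yield (0, f(0)), (1, f(1)), (2, f(2)), ... in the natural order."""
    for x in count():
        yield (x, f(x))

def learner(data):
    """Read the first pair (0, f(0)) and output f(0) as the only conjecture."""
    _, value = next(data)
    return value

def succeeds(h, f, check_up_to=50):
    """Finite sanity check that program h agrees with f on 0..check_up_to-1."""
    return all(PROGRAMS[h](x) == f(x) for x in range(check_up_to))

# One conjecture per target function in the toy class.
conjectures = [learner(graph(f)) for f in PROGRAMS]
```

On this toy class the learner needs a single value of f before it commits to a program, which is the situation the finite-learning paradigm discussed later formalizes.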

We will compare the classes of functions identifiable by probabilistic algorithms with 
different probabilities of correct answer. 

2 Definitions 

Next, we introduce the formal notation and definitions used in this paper. For more background
information, see […] for recursive function (computability) theory, […] for set
theory and […] for inductive inference.

A learning machine is an algorithmic device that reads values of a function f: f(0), f(1),
. . .. Having seen finitely many values of the function, it can output a conjecture. A conjecture
is a program in some fixed acceptable programming system […, 33].

It makes sense to allow the learning machine to revise its conjecture. In this case, the last
conjecture output by the algorithm should be correct but intermediate conjectures may be
wrong. This increases the power of the learning algorithm. It can also be motivated by
the fact that humans learning a complex behavior (for example, a foreign language or driving)
do not learn it completely at once. Rather, they first learn a part of it, then extend it by
learning more. This model, where an unlimited number of conjectures is allowed, is called
learning in the limit […].



Definition 1 […]

(a) A deterministic learning machine M EX-identifies (identifies in the limit) a function
f if, given f(0), f(1), . . ., it outputs a sequence of programs h_0, h_1, . . . such that, for
some i, h_i = h_{i+1} = h_{i+2} = · · · and h_i is a program computing f.

(b) M EX-identifies a set of functions U if it EX-identifies all f ∈ U.

(c) EX denotes the set of all sets U that are EX-identifiable.
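As an illustration of learning in the limit, the following Python sketch uses one simple strategy, often called identification by enumeration: after each new value, conjecture the first program in a fixed list that is consistent with all data seen so far. The finite list `programs` of total Python functions is a hypothetical stand-in for a programming system; totality is what makes the consistency check terminate.

```python
def ex_conjectures(programs, values):
    """Yield the conjecture made after each new value f(0), f(1), ...

    The conjecture is the least index of a program consistent with all data
    seen so far, or None if no listed program is consistent.
    """
    seen = []
    for y in values:
        seen.append(y)
        yield next(
            (h for h, prog in enumerate(programs)
             if all(prog(x) == v for x, v in enumerate(seen))),
            None,
        )

# Hypothetical hypothesis space of three total programs.
programs = [lambda x: 0, lambda x: x, lambda x: x * x]
target = programs[2]
history = list(ex_conjectures(programs, (target(x) for x in range(6))))
```

Here the early conjectures are wrong and get revised, but from some point on the conjecture no longer changes and is correct, which is all that Definition 1 asks for.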

A model where only one conjecture is allowed and it must be correct has been studied 
as well. This model is called finite learning. It is more restricted and (in most contexts) 
simpler. 



Definition 2 […]

(a) A deterministic learning machine M finitely identifies (FIN-identifies) a function f if,
receiving f(0), f(1), . . . as the input, it produces a program computing f.

(b) M FIN-identifies a set of functions U if it FIN-identifies every function f ∈ U.

(c) A set of functions U is called FIN-identifiable if there exists a deterministic learning
machine that FIN-identifies U. The collection of all FIN-identifiable sets is denoted by FIN.

The problems that we consider are fairly simple for probabilistic learning in the limit
(cf. […]) but are much more complicated (and more interesting) for finite learning.
Therefore, in this survey, we focus on finite learning. References to work on other models of
inductive inference are given in Sections 8 and 9. Next, we define identification by probabilistic
machines. We define it for FIN but the definition carries over to EX and other paradigms
as well.

Definition 3 (a) A probabilistic learning machine M ⟨p⟩FIN-identifies (FIN-identifies
with probability p) the set of functions U if, for any function f ∈ U, the probability
that M FIN-identifies f is at least p.

(b) The collection of all ⟨p⟩FIN-identifiable sets is denoted by ⟨p⟩FIN.

Team identification is another idea closely related to probabilistic identification. A team
is just a finite set of learning machines {M_1, M_2, . . . , M_s}.

Definition 4 (a) A team M [r, s]FIN-identifies the function f if at least r of the learning
machines M_1, . . ., M_s FIN-identify f.

(b) The collection of all [r, s]FIN-identifiable sets is denoted by [r, s]FIN.

It is easy to see that [r, s]FIN ⊆ ⟨r/s⟩FIN. (Just choose one of the machines in the team
uniformly at random and simulate it.) In some cases, the opposite is also true and every
probabilistic machine can be simulated by a team.
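The inclusion of team classes in probabilistic classes can be sketched in a few lines of Python. The representation is an assumption made for illustration: each team member is a function from the observed values to a conjecture, and `is_correct` is a hypothetical oracle-like predicate telling us whether a conjecture is right (no such predicate is computable in general; it is used here only to count successes).

```python
import random
from fractions import Fraction

def probabilistic_from_team(team, rng=random):
    """Return a learner that picks one team member uniformly at random and runs it."""
    def learner(values):
        member = rng.choice(team)
        return member(values)
    return learner

def success_probability(team, values, is_correct):
    """Exact success probability of the random-choice learner on one function."""
    good = sum(1 for member in team if is_correct(member(values)))
    return Fraction(good, len(team))

# Toy team of s = 3 machines, r = 2 of which output the correct conjecture "a".
toy_team = [lambda v: "a", lambda v: "a", lambda v: "b"]
prob = success_probability(toy_team, [0, 1], lambda h: h == "a")
```

If at least r of the s members are correct on a function, the random-choice learner is correct on it with probability at least r/s, which is exactly the requirement of Definition 3.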

The main goal of research in probabilistic inductive inference is determining how ⟨p⟩FIN
depends on the accepting probability p. Formally, it means describing the probability hierarchy.

Definition 5 The probability hierarchy for FIN is the set of all points p such that there is
U ∈ ⟨p⟩FIN but U ∉ ⟨p + ε⟩FIN for every ε > 0.



3 Explicit results for FIN 



Probabilistic FIN-identification was first studied by Freivalds […]. He showed that any
probabilistic learning machine with the probability of correct answer above 2/3 can be replaced
by an equivalent deterministic machine. He also characterized the probability hierarchy for
FIN between 1/2 and 2/3.

Theorem 1 […]

(a) If p > 2/3, then ⟨p⟩FIN = FIN.

(b) ⟨2/3⟩FIN ≠ FIN.

(c) If n/(2n − 1) ≥ p > (n + 1)/(2n + 1), then ⟨p⟩FIN = ⟨n/(2n − 1)⟩FIN = [n, 2n − 1]FIN.

(d) ⟨(n + 1)/(2n + 1)⟩FIN ≠ ⟨n/(2n − 1)⟩FIN.
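Part (c) can be turned into a small computation. Solving n/(2n − 1) ≥ p for n gives n ≤ p/(2p − 1), so for 1/2 < p ≤ 1 the matching n is the integer part of p/(2p − 1). The Python sketch below (the function name `canonical_team` is ours, not from the literature) returns the team size [n, 2n − 1] that this theorem associates with a probability p, using exact rational arithmetic.

```python
from fractions import Fraction
from math import floor

def canonical_team(p):
    """For 1/2 < p <= 1, return (n, 2n - 1) with n/(2n-1) >= p > (n+1)/(2n+1)."""
    p = Fraction(p)
    if not (Fraction(1, 2) < p <= 1):
        raise ValueError("only probabilities in (1/2, 1] are covered")
    n = floor(p / (2 * p - 1))
    return n, 2 * n - 1

# The first few cutpoints n/(2n - 1): 1, 2/3, 3/5, 4/7, ...
cutpoints = [Fraction(n, 2 * n - 1) for n in range(1, 6)]
```

For p above 2/3 the function returns (1, 1), i.e. a single deterministic machine, in line with part (a).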

It also makes sense to consider probabilistic algorithms with the probability of correct
answer 1/2 and below because there are infinitely many outputs and, hence, even designing
an algorithm that gives the correct answer with probability ε (for an arbitrarily small fixed
ε > 0) may be nontrivial. Here, the first result was a surprising discovery that a "2 out of
4" team is more powerful than a "1 out of 2" team.

Theorem 2 […]

(a) There is a set of functions U such that U ∈ [2, 4]FIN but U ∉ [1, 2]FIN.

(b) [1, 2]FIN = [3, 6]FIN = [5, 10]FIN = . . . and [2, 4]FIN = [4, 8]FIN = [6, 12]FIN = . . ..

(c) ⟨1/2⟩FIN = [2, 4]FIN.

We see that the power of a team depends not only on the ratio of machines that must 
succeed but also on the number of machines in the team. 
The next step was moving below probability 1/2. 

Theorem 3 […] Let p_0 = 1/2, p_1 = 24/49, p_2 = 20/41, p_3 = 17/35, p_4 = 15/31. Then, for all i ∈ {0, 1, 2, 3}:

(a) For all x ∈ ]p_{i+1}, p_i], ⟨x⟩FIN = ⟨p_i⟩FIN, and

(b) ⟨p_i⟩FIN ≠ ⟨p_{i+1}⟩FIN.

Each of these cutpoints was proven separately and there seemed to be no formula or 
unifying proof argument connecting them. It took several years to obtain a more general 
result. 



Theorem 4 […] The probability hierarchy for FIN in the interval [12/25, 1/2] is
{ …/(25n − 34) | … } ∪ {24/49, 20/41, 17/35, 15/31, 27/56}.

Thus, it turned out that there was a formula ( …/(25n − 34) ) and a general argument for this
interval. It was only obscured by exceptions to this formula at the beginning. With
probabilities getting smaller, progress became more and more difficult. The full proof of
Theorem 4 was more than 100 pages long. On the other hand, it only described the situation
for the interval [12/25, 1/2].

4 Explicit results for PFIN 

One of the approaches to this situation was to consider Popperian FINite identification (PFIN),
a restricted version of FIN. FIN allows two types of errors on functions that are not identified
by a machine. These are

1. Errors of commission. The program output by a machine M produces a value different 
from the value of the input function. 

2. Errors of omission. The program output by M does not halt on some input. 

Errors of omission are the ones that cause the most trouble. The reason is that, given a program h
output by a machine M, we cannot tell whether h halts on input x. If we eliminate them,
the model becomes simpler and still remains interesting.

Definition 6 […] A learning machine M is Popperian if it does not make errors of omission
(i.e., if all conjectures on all inputs are programs computing total functions).

Definition 7 (a) A set of functions U is PFIN-identifiable if there is a Popperian machine
M that FIN-identifies U.

(b) PFIN denotes the collection of all PFIN-identifiable sets. 

Probabilistic and team PFIN-identification are introduced similarly. It is important that 
the requirement about learners outputting only programs computing total recursive functions 
is absolute, i.e. 

1. All conjectures of all machines in a PFIN-team must be programs computing total 
recursive functions. 

2. A probabilistic PFIN-machine is not allowed to output a program which does not
compute a total recursive function, even with a very small probability.

Daley, Kalyanasundaram and Velauthapillai […] proved counterparts of Theorems 1,
2 and 3 for PFIN. The situation for probabilities greater than or equal to 1/2 was precisely
the same as for FIN, only the proofs became simpler. For probabilities smaller than 1/2,
two sequences of points where the power of probabilistic machines changed were discovered.
The first, 4n/(9n − 2), started at 1/2 and converged to 4/9.



Theorem 5 […]

(a) The probability hierarchy for PFIN in the interval [1/2, 1] is {n/(2n − 1) | n ≥ 1}.

(b) The probability hierarchy for PFIN in the interval [4/9, 1/2] is {4n/(9n − 2) | n ≥ 2}.

The second sequence was more complicated. It was actually a union of three simpler
sequences corresponding to three different ways in which machines in a team can behave.

Theorem 6 […] The probability hierarchy for PFIN in the interval [3/7, 4/9] is
{ … | n ≥ 6} ∪ { … | n ≥ 12} ∪ { … | 4 ≤ n ≤ 11}.

However, even for Popperian learning, things were getting more complicated as the
probabilities decreased (this can be observed both by just comparing the sequences of probabilities
in Theorems 5 and 6 and by looking at the arguments that were used to prove these
theorems). As a result, the authors of […] wrote that the prospects of determining all
cutpoints are bleak even for the interval [2/5, 1/2].

5 General results for PFIN 

An alternative approach was proposed in [1]. Instead of trying to find all cutpoints explicitly,
[1] focused on studying the general properties of the whole probability structure.

The first step was describing existing diagonalization constructions (i.e. constructions
proving that there is U ∈ ⟨p⟩PFIN such that U ∉ ⟨p + ε⟩PFIN for ε > 0) in a general form.

Theorem 7 […] Let P_PFIN be the probability hierarchy for PFIN and p_1, . . . , p_s ∈ P_PFIN.
Let p ∈ [0, 1]. If there are q_1 ≥ 0, . . . , q_s ≥ 0 such that

1. q_1 + q_2 + · · · + q_s = p;

2. p/(1 − p + q_i) = p_i for i = 1, . . . , s,

then p ∈ P_PFIN.

Figure 1: The probability hierarchies for EX, FIN and PFIN. (Points labeled on the axes:
EX at 1/4, 1/3, 1/2, 1; FIN at 12/25, 24/49, 1/2, 3/5, 2/3, 1; PFIN at 3/7, 4/9, 1/2, 3/5, 2/3.)

This led to a conjecture that P_PFIN is equal to the set A defined as follows.

1. 1 ∈ A.

2. If p_1, p_2, . . . , p_s ∈ A and p ∈ [0, 1] is such that there exist q_1, . . . , q_s ∈ [0, 1] satisfying

(a) q_1 + q_2 + . . . + q_s = p;

(b) p/(1 − p + q_i) = p_i for i = 1, . . . , s,

then p ∈ A.
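The closure rule can be explored numerically. The Python sketch below rests on an assumption that should be stated loudly: it takes condition (b) to read p/(1 − p + q_i) = p_i. Under that reading q_i = p/p_i − (1 − p), and substituting into (a) gives p = s/(s − 1 + Σ 1/p_i). The function name `derive` is ours and purely illustrative.

```python
from fractions import Fraction

def derive(ps):
    """Return the p obtained from p_1, ..., p_s by rule 2, or None.

    ASSUMPTION: condition (b) is p / (1 - p + q_i) = p_i, so that
    q_i = p / p_i - (1 - p) and p = s / (s - 1 + sum(1 / p_i)).
    None is returned when some q_i falls outside [0, 1].
    """
    ps = [Fraction(x) for x in ps]
    s = len(ps)
    p = Fraction(s) / (s - 1 + sum(1 / x for x in ps))
    qs = [p / x - (1 - p) for x in ps]
    if all(0 <= q <= 1 for q in qs):
        assert sum(qs) == p  # condition (a) holds by construction
        return p
    return None

# Starting from 1 alone, n copies of 1 give n/(2n - 1).
first_sequence = [derive([1] * n) for n in range(1, 6)]
```

Under the stated assumption, n copies of 1 yield n/(2n − 1), and the pair (2/3, n/(2n − 1)) yields 4n/(9n − 2), which agrees with the two sequences of Theorem 5; this agreement is a consistency check on the assumed form of (b), not a proof of it.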

Indeed, A = P_PFIN, and the first step in proving that was observing some structural
properties of this set.

Definition 8 […] A set A is well-ordered if there is no infinite strictly decreasing
sequence of elements of A. A set A is well-ordered in decreasing order if there is no infinite
strictly increasing sequence of elements of A.

Figure 1 shows the known parts of the probability hierarchies for EX, FIN and PFIN. It
is easy to see that all are well-ordered in decreasing order. The set A defined above is
well-ordered in decreasing order as well.

Theorem 8 [1] The set A is well-ordered in decreasing order and has a system of notations.



A system of notations is an algorithmic description of a well-ordered set. It allows
one to find preceding elements, given one element. This notion was introduced by Kleene for
constructive ordinals [21] and extended to sets of reals (like A) in [1]. Well-orderedness is
crucial because it allows the use of induction over the elements of the set A. Having a system
of notations is important to make this induction algorithmic. Using well-orderedness and
the system of notations, [1] showed the following result.

Theorem 9 [1] Let p ∈ A and p' < p be such that there is no p'' ∈ A with p' ≤ p'' < p.
Then, ⟨p⟩PFIN = ⟨p'⟩PFIN.

Corollary 1 […] A = P_PFIN.

This approach gives two other interesting results. 

Theorem 10 […] The probability structure of PFIN is decidable, i.e. there is an algorithm
that receives two probabilities p_1 and p_2 and answers whether ⟨p_1⟩PFIN = ⟨p_2⟩PFIN.

Theorem 11 […] Let p ∈ P_PFIN. Then, there is a k such that [pk, k]PFIN = ⟨p⟩PFIN.

Thus, teams of different size can have different learning power (the counterpart of
Theorem 2 in […]) but we always have the "best" team size such that a team of this size can simulate
any probabilistic machine (and hence, a team of any other size with the same success ratio).

Finally, it is also possible to determine the precise ordering type of the probability hierar- 
chy. The table below shows how the complexity of the ordering increases when probabilities 
decrease. 



Interval      Ordering type of the probability hierarchy
[1/2, 1]      ω
[4/9, 1]      2ω
[3/7, 1]      3ω
[2/5, 1]      ω^2
[…, 1]        ω^3
[…, 1]        ω^ω
[…, 1]        ω^(ω^ω)
[0, 1]        ε_0

ω is the ordering type corresponding to a single infinite sequence (2/3, 3/5, 4/7, . . .), kω
is the ordering type of a set consisting of k infinite sequences, ω^2 is the ordering type of
a set consisting of an infinite sequence of sequences and ω^3 is the ordering type of an infinite
sequence of ω^2-type sets. ω^ω is the limit of ω, ω^2, ω^3, . . .. Further ordering types can be
defined similarly […]. The last one, ε_0, is the limit of

ω, ω^ω, ω^(ω^ω), . . .

and is considered to be so big that it is hard to find any intuitive description for it.¹ This
shows that the explored part of the PFIN-hierarchy (the interval [3/7, 1], the ordering type 3ω) is
very simple compared to the entire hierarchy. This result can also be considered a partial
explanation of why it is unrealistic to find explicit values for all points in the probability
hierarchy.

6 General results for FIN? 

An easy corollary of results in [1] is

Theorem 12 If ⟨p_1⟩PFIN ≠ ⟨p_2⟩PFIN, then ⟨p_1⟩FIN ≠ ⟨p_2⟩FIN.

Thus, any diagonalization argument that works for PFIN will work for FIN as well. This 
means that the probability hierarchy for FIN is at least as complicated as for PFIN. (In fact, 
it is more complicated because there are points (like 24/49) that are not contained in the 
PFIN hierarchy but appear in the FIN hierarchy.) 

There have been several attempts to move beyond explicit probabilities and to find general
proof methods for FIN. Daley and Kalyanasundaram […] have developed a set of reduction
arguments (techniques to reduce problems about inclusions for smaller probabilities to
already solved problems about inclusions at bigger probabilities). These arguments were essential
to proving Theorem 4. They were also able to explain "the strange probabilities" of Theorem
3. Yet, the number of cases they had to handle is huge and, for further progress, even more
general techniques are necessary.

Similar reduction arguments for PFIN […] were the foundation for the general results of [1].
So, we may expect that the methods of […] could serve as a foundation for similar results for
FIN.

Another approach was taken by […], who defined asymmetric teams, a generalization
of usual teams. […] showed that a more general result about asymmetric teams
(well-quasi-orderedness) would imply well-orderedness and decidability of the FIN-hierarchy. They also
claimed a proof of well-quasi-orderedness for asymmetric FIN-teams. However, a bug was
discovered in this proof and it turned out that asymmetric FIN-teams are not
well-quasi-ordered […]. This suggests that, if it is possible to prove a counterpart of the results in Section 5
for FIN, then the proof should use properties that are specific to traditional teams.



¹It is also known […] that ε_0 is the ordering type of the set of all expressions possible in first-order
arithmetic but this does not look very relevant to our inductive inference result.



7 Other problems about FIN and PFIN 

Besides finding the cutpoints, there are other problems about probabilistic and team learning
that are worth studying. One of them relates probabilistic and team learning to oracle
computation.

Assume that we have two teams (or probabilistic machines) and one of them is weaker 
than the other. If we allow the weaker team (probabilistic machine) to access some oracle 
(for example, K, the oracle for the halting problem), we increase the power of this team and 
it may be able to learn everything that the stronger team can learn. Kummer […]² studied
the following problem: given a, b, c, d such that [a, b]FIN ⊈ [c, d]FIN, what is the class of
oracles A such that [a, b]FIN ⊆ [c, d]FIN[A]? ([c, d]FIN[A] denotes the collection of sets of
functions that are identifiable by a [c, d]FIN-team with access to the oracle A.)

We summarize his results in two theorems below. The first theorem partitions the tuples a, b, c, d
into those for which [a, b]FIN ⊆ [c, d]FIN[A] for some A and those for which [a, b]FIN ⊈ [c, d]FIN[A] for
all A. It also shows that, whenever such an A exists, the halting oracle K can be used as A.

Theorem 13 […]

1. If there is k ∈ ℕ such that … , then [m, n]FIN ⊈ [m', n']FIN[A] for any
oracle A.

2. If … and … for some k, then [m, n]FIN ⊆ [m', n']FIN[A] for any A such
that K ≤_T A (i.e., the halting oracle K is Turing-reducible to A).

The second theorem considers the question whether oracles weaker than K can be used 
as A in some cases. For this result, we need some extra definitions. 

Let M_1, M_2, . . . be an enumeration of all Turing machines and φ_i be the partial function
computed by M_i.

Definition 9 PA denotes the set of all oracles A such that, given A, there is a function
f(x, y) that is computable with the oracle A and:

1. If φ_x(y) = 0 or φ_x(y) = 1, then f(x, y) = φ_x(y).

2. Otherwise, f(x, y) can be anything but it must be defined (even if φ_x(y) is undefined).

In other words, an oracle in PA can be used to extend any {0, 1}-valued partial recursive
function to a total function (computable with the oracle) that is consistent with the original
function. Equivalently, PA can be defined as the set of all oracles A that are Turing-equivalent
to a complete and consistent extension of Peano arithmetic […]. If K reduces to A, then
A ∈ PA. However, the converse is not true [29].



²Related problems about oracles have also been studied in […].

Theorem 14 […]

1. Let m, n be such that [m, n]PFIN ⊈ [m', n']PFIN. Then, [m, n]PFIN ⊆ [m', n']PFIN[A]
if and only if [m, n]FIN ⊆ [m', n']FIN[A] if and only if K ≤_T A.

2. [24, 49]FIN ⊆ [1, 2]FIN[A] if and only if A ∈ PA.

Thus, we see that a weaker oracle may suffice because there are A ∈ PA such that K is
not Turing-reducible to A […]. Kummer […] asked whether these two possibilities (we need
an oracle A such that K ≤_T A, or any A ∈ PA suffices) are the only ones. We have a partial
answer […].

Definition 10 PA' denotes the set of all oracles A such that, given A, there is a function
f(x, y) that is computable with the oracle A and has the following properties:

1. If φ_x(y) = 0 or φ_x(y) = 1, then f(x, y) = φ_x(y).

2. Otherwise, f(x, y) can be anything but it must be defined (even if φ_x(y) is undefined).

3. If, for some x, there is at most one y such that φ_x(y) = 1, then f(x, y) = 1 for at most
one y.

Theorem 15 […] For any a, b, c, d such that [a, b]FIN ⊈ [c, d]FIN, the set of oracles A such
that [a, b]FIN ⊆ [c, d]FIN[A] is one of the following:

1. The empty set.

2. The set of all A such that K ≤_T A.

3. PA (see Definition 9).

4. PA' (see Definition 10).

It is easy to see that PA ⊇ PA' ⊇ {A : K ≤_T A}. However, we do not know whether
PA' coincides with PA, with {A : K ≤_T A}, or is different from both of them.

Other properties of the probability hierarchy deserve studying as well. For example, […]
asked how close the points of the PFIN-hierarchy are to one another.

Question 1 […] Is it true that there is a constant c > 1 such that every interval [x, y] ⊆
[…, …] with y − x > … contains at least one point from the PFIN-hierarchy?

A similar question can be asked about FIN.

8 Other paradigms of inductive inference

Similar problems can be studied for other paradigms of inductive inference (besides FIN). 
One of the most interesting open cases is probabilistic language learning in the limit. In 
language learning, the object to be learned is a recursively enumerable language L. The 
standard presentation for the language is a text. 



Definition 11 […]

(a) A text T for a language L is an enumeration (in any order) of all words in L.

(b) A learning machine M TxtEx-identifies a language L if, given any text T for L as
input, it outputs a sequence of grammars g_1, g_2, . . . such that g_i = g_{i+1} = g_{i+2} = · · ·
and g_i recognizes L, for some i ∈ ℕ.

(c) A set U of languages is called TxtEx-identifiable (identifiable in the limit) if there is
a machine M that TxtEx-identifies every L ∈ U. TxtEx denotes the collection of all
TxtEx-identifiable sets.

Note that M does not get any information about the words not in L. This is the biggest
difference between inductive inference of functions and languages. If f(x) ≠ y, M knows
that after receiving f(x). If a word x is not in L, M never knows it because it may be the
case that x ∈ L but it has not appeared in the input yet.
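The asymmetry between the two kinds of data can be seen in a short Python sketch. The representation is an assumption for illustration only: candidate functions are Python callables, candidate languages are finite Python sets, and a "text" is a finite prefix of an enumeration.

```python
def refuted_function(prefix, g):
    """A candidate function g is refuted once some observed value differs from g."""
    return any(g(x) != y for x, y in prefix)

def refuted_language(text_prefix, candidate):
    """A candidate language is refuted only by an observed word it does not contain."""
    return any(word not in candidate for word in text_prefix)

# Function data: the pair (1, 1) rules out the constant-zero candidate.
function_prefix = [(0, 0), (1, 1)]
zero = lambda x: 0

# Language data: a text for L = {"a"} can never rule out the superset {"a", "b"}.
text_prefix = ["a", "a", "a"]
superset = {"a", "b"}
```

However long the text for {"a"} is continued, the superset stays unrefuted, whereas any wrong candidate function is refuted after finitely many values.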

The probability hierarchy for TxtEx has been studied by Jain and Sharma [18]. Below,
we summarize their main results. The first theorem concerns the probability at which a
probabilistic machine becomes stronger than a deterministic one. Similarly to PFIN or FIN,
it is 2/3. However, the similarities end once the next point in the probability hierarchy is
revealed. It is 5/8 (instead of 3/5). It remains open what the next points below 5/8 are.

Theorem 16 […]

(a) If p > 2/3, then ⟨p⟩TxtEx = TxtEx.

(b) [2, 3]TxtEx ≠ TxtEx.

(c) If 5/8 < p ≤ 2/3, then ⟨p⟩TxtEx ⊆ [2, 3]TxtEx.

(d) [5, 8]TxtEx ≠ [2, 3]TxtEx.

The second theorem concerns relationships between teams of different size at probability 
1/2. Teams of different size may have different learning power (similarly to FIN or PFIN). 
Also similarly to PFIN and FIN, all [n, 2n]-team sizes for odd n were equivalent. However, 
for even n results were no longer the same as for FIN or PFIN. 



Theorem 17 […]

1. [1, 2]TxtEx ≠ [2, 4]TxtEx.

2. [1, 2]TxtEx = [3, 6]TxtEx = [5, 10]TxtEx = . . ..

3. For all i > 0, [2^i, 2^(i+1)]TxtEx ≠ [2^(i+1), 2^(i+2)]TxtEx.

4. For all k, [k, 2k]TxtEx ≠ ⟨1/2⟩TxtEx.

In both cases, we see that there are both similarities with FIN and differences. We think that
it can be very interesting to study this hierarchy. However, before general results are proved,
it may be necessary to get a better knowledge of explicit probabilities and to accumulate
more proof techniques.

9 Conclusions and related work 

We surveyed the work in probabilistic inductive inference, with an emphasis on recent work
for FIN and PFIN. For good surveys about earlier results, see […] for inductive inference
in general and […, 16] for probabilistic inductive inference.

The biggest challenge in the area remains obtaining general results about the probability
hierarchy for unrestricted FIN. In Section 6, we mentioned several approaches to this problem.
None of them has been completely successful but there is a chance that these ideas
can be extended, giving more insight into unrestricted FIN. There are other interesting
problems about FIN that deserve studying as well (like FIN with oracles, Section 7).

Besides FIN, probability hierarchies for other learning models can be studied. Probabilistic
language learning in the limit […] is the most interesting among those. Other recently
studied models are probabilistic language learning with monotonicity restrictions […] and
probabilistic learning up to a small set of errors […].